Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics
Authors
Abstract
Human-Object Interaction (HOI) detection is an essential task for understanding human-centric images from a fine-grained perspective. Although end-to-end HOI detection models thrive, their paradigm of parallel human/object detection and verb class prediction loses the merit of two-stage methods: an object-guided hierarchy. The object in an HOI triplet gives direct clues to the verb to be predicted. In this paper, we aim to boost end-to-end models with object-guided statistical priors. Specifically, we propose to utilize a Verb Semantic Model (VSM) and use semantic aggregation to profit from this object-guided hierarchy. A Similarity KL (SKL) loss is proposed to optimize the VSM so that it aligns with the HOI dataset's priors. To overcome the static semantic embedding problem, we propose to generate cross-modality-aware visual and semantic features by Cross-Modal Calibration (CMC). The above modules combined compose the Object-guided Cross-modal Calibration Network (OCN). Experiments conducted on two popular HOI detection benchmarks demonstrate the significance of incorporating statistical prior knowledge and produce state-of-the-art performance. More detailed analysis indicates that the proposed modules serve as a stronger verb predictor and a superior method of utilizing prior knowledge. The code is available at https://github.com/JacobYuan7/OCN-HOI-Benchmark.
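To illustrate the idea that the object in a triplet gives clues to the verb, here is a minimal sketch of one kind of object-guided statistical prior: an empirical estimate of P(verb | object) counted from annotated triplets. This is not the authors' implementation. The function name `verb_prior_given_object` and the toy annotations are assumptions made up for illustration, and the abstract does not specify how the priors used by the VSM or the SKL loss are actually computed.

```python
from collections import Counter, defaultdict


def verb_prior_given_object(triplets):
    """Estimate P(verb | object) by counting (human, verb, object) triplets.

    Hypothetical helper for illustration only; not taken from the OCN code.
    """
    counts = defaultdict(Counter)
    for _human, verb, obj in triplets:
        counts[obj][verb] += 1

    prior = {}
    for obj, verb_counts in counts.items():
        total = sum(verb_counts.values())
        # Normalize the counts so the verbs seen with each object sum to 1.
        prior[obj] = {verb: n / total for verb, n in verb_counts.items()}
    return prior


# Toy annotations, invented for this example (not from any benchmark).
toy_triplets = [
    ("person", "ride", "bicycle"),
    ("person", "ride", "bicycle"),
    ("person", "hold", "bicycle"),
    ("person", "repair", "bicycle"),
    ("person", "hold", "cup"),
    ("person", "drink_from", "cup"),
]

prior = verb_prior_given_object(toy_triplets)
```

In this toy data, knowing that the object is a bicycle makes "ride" the most likely verb and rules out "drink_from", which is the kind of object-conditioned cue that parallel verb prediction does not exploit directly.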
Similar resources
Detecting and Recognizing Human-Object Interactions
To understand the visual world, a machine must not only recognize individual object instances but also how they interact. Humans are often at the center of such interactions and detecting human-object interactions is an important practical and scientific problem. In this paper, we address the task of detecting 〈human, verb, object〉 triplets in challenging everyday photos. We propose a novel mod...
Cross-Modal Object Recognition Is Viewpoint-Independent
BACKGROUND: Previous research suggests that visual and haptic object recognition are viewpoint-dependent both within- and cross-modally. However, this conclusion may not be generally valid as it was reached using objects oriented along their extended y-axis, resulting in differential surface processing in vision and touch. In the present study, we removed this differential by presenting objects ...
Human concerned object detecting in video
The purpose of our work is to detect the target human concerned in video. For security considerations, event detection in video has potential economic and social needs. Human concerned object detecting is very helpful for event detection. In some emergency or special events, people will focus on specific object. We need locate human body and face, detect the sight direction, and determine the o...
Modal Object Diagrams
While object diagrams (ODs) are widely used as a means to document object-oriented systems, they are expressively weak, as they are limited to describe specific possible snapshots of the system at hand. In this paper we introduce modal object diagrams (MODs), which extend the classical OD language with positive/negative and example/invariant modalities. The extended language allows the designer...
Declarative Semantics in Object-Oriented Software Development - A Taxonomy and Survey
One of the modern paradigms to develop an application is object oriented analysis and design. In this paradigm, there are several objects and each object plays some specific roles in applications. In an application, we must distinguish between procedural semantics and declarative semantics for their implementation in a specific programming language. For the procedural semantics, we can write a ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i3.20229